Abstract

In studying college effects, an input-output model is commonly used in
which student input is controlled by using regression analysis to compute
an "expected" output. The part correlation of the college environment
variable and the output with input variance removed only from the output is
interpreted as a measure of the college effect. However, this is not the
most useful procedure that may be used since part (or partial) correlation
may severely underestimate the magnitude of the true college effect.
Interpreted within a causal model, partial regression coefficients appear to
be a generally more satisfactory measure of college effects. Four models
are used to illustrate the advantages of using partial regression
coefficients in a causal framework. Another advantage in using these
coefficients is that they have greater stability across different units of
measurement.

A commonly used procedure in studying college effects involves an
input-output model in which student input is controlled by using regression
analysis to compute an "expected" output (e.g., Astin, 1963, 1964;
Thistlethwaite and Wheeler, 1966). The correlation of a school environment
variable with the residual output (i.e., actual minus "expected" output) is
interpreted as a measure of the school's influence on the output. Although
sometimes labeled a partial correlation, it is more accurately described as
the part correlation (McNemar, 1962, p. 167) of the school with the output
variable when the influence of the input variables has been removed from the
output.

A potentially serious interpretational problem is that part correlations
may severely underestimate the magnitude of the true college effect.
This possibility was noted previously by Richards (1966):

"suppose that a real effect of small colleges is to encourage 
students to develop warm personal relationships with the faculty, 
and that the socio-economic status of college students has no 
inherent relationship to their tendency to develop warm relations 
with the faculty. Suppose further that there is a strong tendency 
for small colleges to attract rich students. Over a sample of 
colleges varying in size, the tendency of rich students to attend 
mainly small colleges will produce a positive correlation between 
socio-economic status and developing warm relations with the 
faculty, but the correlation between college size and developing 
warm relations will not be increased by the fact that small colleges 
attract rich students. Consideration of the basic formula for 
computing partial correlations makes it clear that, in these cir- 
cumstances, controlling for differences in socio-economic status will 
tend to reduce the correlation between college size and the extent 
to which students develop warm relations with the faculty, and 
therefore to obscure the true causal relationship (p. 381)." 

The logic of Richards' argument applies equally to part and partial 
correlations. How then should the problem of part correlations under- 
estimating the size of the college effect be resolved? As Richards presents 
the problem, a researcher seems to have two alternatives: he either controls
for student input characteristics or he does not. Astin (1968) rejected
Richards' implication that, given these alternatives, it might be better not
to control for student input: "As long as the student is used as the unit
of analysis in the control of input characteristics, any environmental
effects... will not be 'obscured' by the statistical adjustments for input
differences that are made in regression analysis or actuarial tables. It is
true that the actual magnitude of the effect may be underestimated somewhat,
but this is a necessary consequence of the partial confounding of student
input and college environmental variables (p. 430)." However, even a
moderate degree of underestimation may seriously obscure the college effect
because the effect is likely to be relatively small and fragile across a
wide sample of colleges. Only a small association attributable to the
college influence is expected because: (a) students usually enter college
with relatively stable attitudes and skills; (b) a single college variable
seldom measures more than one aspect of the total college effect; and (c)
any one aspect of the college may affect only a limited number of students.

The work of Blalock (1960, 1961, 1964, 1965, 196?) and Tukey (1954)
indicates that a partial regression procedure is superior to part (or
partial) correlation because controls for input may be introduced without
underestimating the magnitude of the college effect. Their argument
emphasizes the inherent need to interpret all statistics within a
theoretical model that is relevant to the problem studied. The advantages of
using regression coefficients, rather than part (or partial) correlations,
to study college effects will be evaluated from the standpoint of four
hypothetical models of "reality."

Model I 

The situation presented by Richards is one that involves a developmental
sequence of the form A → B → C. The variables corresponding to A, B, and C
are socioeconomic status (SES), size of the college (SIZE), and warmth of
the student relationship with the faculty (WARMTH). Specifically, the
following relationships are implied: (1) SES directly affects SIZE, i.e.,
affluence influences the size of the college a student attends; (2) SIZE
directly affects WARMTH, i.e., smallness produces warmer student-faculty
relations; and (3) SES influences WARMTH only indirectly, through the
mediating variable, SIZE. This model is shown in Figure 1.




Fig. 1 Input variable (SES) influences the
college environment variable (SIZE), which in turn
influences output (WARMTH).

In order to analyze these relationships in a causal model, it must be 
assumed that variables outside the system do not directly affect more than 
one of the three variables included. In essence, this assumption ensures 
that outside variables do not affect the correlations among SES, SIZE, and 
WARMTH. If it is known that an outside variable does influence more than
one of the variables included, that variable should be brought into the 
causal model. 

One of the advantages of using regression coefficients instead of part
correlations in the A → B → C model is this: a control for A ordinarily
reduces the magnitude of the partial correlation $r_{BC.A}$, although a
control for A does not affect the expected value of the corresponding
regression coefficient, $b_{CB.A}$. In order to illustrate this point, let
us assume that the strengths of both the SES-SIZE and the SIZE-WARMTH
relationships are completely nonspurious correlations of +.50, and that all
variances equal unity. Since assumption (3) necessarily (Simon, 1954)
implies a zero partial correlation ($r_{AC.B}$) of SES with WARMTH when SIZE
is controlled, the formula for partial correlation can be used to calculate
the zero order correlation ($r_{AC}$) of SES with WARMTH. The zero order
correlation, in turn, can be used to calculate the part correlation
$r_{B(C.A)}$ of SIZE with WARMTH when the influence of SES is removed from
WARMTH as shown below.

$$r_{AC.B} = \frac{r_{AC} - r_{AB} r_{BC}}{\sqrt{(1 - r_{AB}^2)(1 - r_{BC}^2)}} = 0$$

where A = SES
      B = SIZE
      C = WARMTH

$$r_{AC} = r_{AB} r_{BC} = .50 \times .50 = .25$$

$$r_{B(C.A)} = \frac{r_{BC} - r_{AC} r_{AB}}{\sqrt{1 - r_{AC}^2}} = \frac{.50 - .25(.50)}{\sqrt{1 - (.25)^2}} = .387$$



Removing the influence of SES from WARMTH reduces the correlation of
SIZE with WARMTH from .50 to .39. In Model I the part correlation (.39)
clearly underestimates the true strength (.50) of the SIZE-WARMTH
relationship. If additional input variables not directly influencing the
output were partialled out of the output, it would be expected that the part
correlation might become even smaller. The relative reduction would depend
upon the strength of the relationship between the input and the output
variables (Blalock, 1964). It appears that the college effect is likely to
be underestimated in the typical college effects study because many input
variables usually are controlled. The corollary is that when a number of
student input variables are controlled, a small part correlation may not
imply a small college effect.
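
The Model I arithmetic can be checked numerically. What follows is a minimal Python sketch, assuming only the values stated in the text (both direct links equal to .50, unit variances). The function name `part_correlation` is illustrative and does not come from the paper.

```python
import math


def part_correlation(r_bc, r_ab, r_ac):
    """Correlation of B with C after A is removed from C only (not from B)."""
    return (r_bc - r_ac * r_ab) / math.sqrt(1.0 - r_ac ** 2)


# Model I (A -> B -> C): both direct links are .50 and variances are unity.
r_ab = 0.50  # SES with SIZE
r_bc = 0.50  # SIZE with WARMTH
# SES reaches WARMTH only through SIZE, so r_AC.B = 0 and r_AC = r_AB * r_BC.
r_ac = r_ab * r_bc

model_i_part = part_correlation(r_bc, r_ab, r_ac)
print(f"r_AC = {r_ac:.2f}, part correlation = {model_i_part:.3f}")
```

Under these assumptions the part correlation comes out near .387, below the .50 strength of the SIZE-WARMTH link that was built into the example.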

On the other hand, Blalock (1964) has shown that regression coefficients
will provide a more interpretable estimate of college effects than will part
or partial correlation. In Model I, the zero order regression coefficient
($b_{CB}$) for estimating WARMTH from SIZE is .50 ($b_{CB} = r_{BC}$).
Framed within this causal model, the coefficient signifies that if the size
of the college decreases one size unit, then the warmth of student-faculty
relationships will increase one-half warmth unit. The regression coefficient
is a measure of the SIZE-WARMTH relationship that is interpretable in an
if-then sense (if SIZE changes, then WARMTH will change in a determinate
way); and it represents a hypothetical measure since it does not indicate
how much SIZE actually changes. In practice, the researcher must give
persuasive reasons for supposing a particular regression coefficient to be a
measure of a particular if-then relationship. With SES controlled, the
partial regression coefficient of WARMTH on SIZE is equal to the zero order
regression coefficient ($b_{CB.A} = b_{CB} = .50$):



$$b_{CB.A} = \frac{r_{BC} - r_{AC} r_{AB}}{1 - r_{AB}^2} \cdot \frac{\sigma_C}{\sigma_B} = \frac{.50 - .25(.50)}{1 - (.50)^2} \cdot \frac{1.0}{1.0} = .50$$



Thus the partial regression coefficient is numerically equal to the true
relationship. The size of the regression coefficient $b_{CB.A}$ in Model I
is not affected by controls for an antecedent input variable that does not
directly influence the output.
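
The same check can be made for the regression coefficient. The sketch below assumes the Model I values given in the text; the helper `partial_slope` and its parameter names are hypothetical, introduced only for illustration.

```python
def partial_slope(r_yx, r_yz, r_xz, sd_y=1.0, sd_x=1.0):
    """Slope of y on x with z controlled, built from zero-order correlations."""
    return (r_yx - r_yz * r_xz) / (1.0 - r_xz ** 2) * (sd_y / sd_x)


# Model I values from the text: r_AB = r_BC = .50, hence r_AC = .25.
r_ab, r_bc = 0.50, 0.50
r_ac = r_ab * r_bc

b_cb = r_bc  # zero-order slope of WARMTH on SIZE (unit variances)
b_cb_a = partial_slope(r_yx=r_bc, r_yz=r_ac, r_xz=r_ab)  # SES controlled

print(f"b_CB = {b_cb:.2f}, b_CB.A = {b_cb_a:.2f}")
```

Both coefficients equal .50 here, in contrast to the part correlation, which shrinks once SES is controlled.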

Thus in a developmental sequence such as that shown in Model I, i.e.,
A → B → C, Richards' criticism of part or partial correlation as an
estimate of the college effect is valid; however, his criticism does not
apply to regression coefficients. The use of partial regression coefficients
avoids ascribing to the college effect variance that may largely be due to
input (Astin, 1963).



Regression coefficients are advantageous to an understanding of causal
relationships because their behavior can be compared more safely than the
behavior of correlation coefficients (Tukey, 1954; Blalock, 1964). Thus the
mere reduction of a partial correlation is difficult to interpret. As
Blalock (1961) noted: "The numerical value of a correlation coefficient may
be reduced not only because a confounding influence has been controlled, but
it may also be altered because we have decreased the total variation in the
independent variable relative to that in other causes of the dependent
variable (p. 87)."

In Model I, it can be shown, for example, that when SIZE is controlled 
the partial regression coefficient of WARMTH on SES is zero, the same as the 
"true" relationship: 



$$b_{CA.B} = \frac{r_{AC} - r_{AB} r_{BC}}{1 - r_{AB}^2} \cdot \frac{\sigma_C}{\sigma_A} = \frac{.25 - .50(.50)}{1 - (.50)^2} \cdot \frac{1.0}{1.0} = .00$$






Therefore, one can correctly deduce from this coefficient that SES does not 
directly influence WARMTH. In other words, granting the assumptions about 
linearity and outside variables, if there were a three-variable sequence in 
which A were antecedent to B, and A and B antecedent to C, and the regression 
coefficient of C on A with B controlled turned out to be zero, one could 
reasonably deduce that the total influence of A on C was mediated through 
variable B. It could be concluded, therefore, that the association of A with 
C in Model I is not spurious but results from the indirect (i.e., mediated)
influence of A on C via B.
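
A small simulation may make these Model I deductions more concrete. The sketch below assumes a linear chain with normally distributed disturbances; the sample size, the random seed, and all function names are arbitrary choices for illustration, not part of the original analysis.

```python
import math
import random


def simulate_model_i(n, seed=0):
    """Draw (SES, SIZE, WARMTH) values from a linear A -> B -> C chain."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - 0.50 ** 2)  # keeps each variance close to 1
    a_vals, b_vals, c_vals = [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = 0.50 * a + noise_sd * rng.gauss(0.0, 1.0)
        c = 0.50 * b + noise_sd * rng.gauss(0.0, 1.0)
        a_vals.append(a)
        b_vals.append(b)
        c_vals.append(c)
    return a_vals, b_vals, c_vals


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def slopes_on_two_predictors(a, b, c):
    """Least-squares slopes of c on a and on b, each with the other controlled."""
    s_aa, s_bb, s_ab = cov(a, a), cov(b, b), cov(a, b)
    s_ac, s_bc = cov(a, c), cov(b, c)
    det = s_aa * s_bb - s_ab ** 2
    slope_a = (s_ac * s_bb - s_bc * s_ab) / det
    slope_b = (s_bc * s_aa - s_ac * s_ab) / det
    return slope_a, slope_b


ses, size, warmth = simulate_model_i(n=20000, seed=1)
b_ca_b, b_cb_a = slopes_on_two_predictors(ses, size, warmth)
print(f"b_CA.B = {b_ca_b:.3f}, b_CB.A = {b_cb_a:.3f}")
```

With a sample of this size the estimated slope of WARMTH on SES (SIZE controlled) should fall close to zero and the slope on SIZE (SES controlled) close to .50, up to sampling error.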

Model II 

Although the part correlation yields misleading results if the true 
model is like Model I, part correlation, partial correlation, or partial 
regression coefficients will lead to correct deductions about the college 
effect if the true model is like Model II. In this model, input influences
both the college environment variable and the output, but the college itself 
does not influence the output. 




Fig. 2 The input variable influences
both the college environment and the output 
variable. 



For the fictitious data in Figure 2, the correlation of the college with 
the output variable can be calculated since the partial correlation of col- 
lege with output (input controlled) equals zero (Simon, 1954): 



$$r_{BC.A} = 0 = \frac{r_{BC} - r_{AB} r_{AC}}{\sqrt{(1 - r_{AB}^2)(1 - r_{AC}^2)}}$$

$$r_{BC} = r_{AB} r_{AC} = .50 \times .50 = .25$$

The part correlation ($r_{B(C.A)}$) of college with output when the
influence of input is removed from output will, like the partial correlation
($r_{BC.A}$) and the partial regression coefficient ($b_{CB.A}$), be zero.

However, the use of partial regression when attempting to build a com- 
plete causal model would lead to more accurate conclusions than would part 
correlation. In Model II, for example, the partial regression coefficient of 
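output on input with the college variable controlled is arithmetically
identical to the zero order regression coefficient of output on input. The
partial regression coefficient is

$$b_{CA.B} = \frac{r_{AC} - r_{AB} r_{BC}}{1 - r_{AB}^2} \cdot \frac{\sigma_C}{\sigma_A}$$

and since $r_{BC} = r_{AB} r_{AC}$ in Model II,

$$b_{CA.B} = \frac{r_{AC}(1 - r_{AB}^2)}{1 - r_{AB}^2} \cdot \frac{\sigma_C}{\sigma_A} = r_{AC} \frac{\sigma_C}{\sigma_A} = b_{CA}$$

Whereas the partial regression coefficient leads to the correct conclusion
that no part of $b_{CA}$ is spurious, the corresponding difference between
$r_{AC}$ and the part correlation is meaningless.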
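
The Model II results can be verified in a few lines. The sketch below takes the input-college and input-output correlations as .50 each, as the Figure 2 arithmetic suggests, and additionally assumes unit variances; the variable names are illustrative only.

```python
import math

r_ab = 0.50  # input with college (fictitious Figure 2 value)
r_ac = 0.50  # input with output (fictitious Figure 2 value)
r_bc = r_ab * r_ac  # the college-output correlation is wholly spurious

# Three candidate measures of the college effect, input controlled.
partial_r = (r_bc - r_ab * r_ac) / math.sqrt((1 - r_ab ** 2) * (1 - r_ac ** 2))
part_r = (r_bc - r_ac * r_ab) / math.sqrt(1 - r_ac ** 2)
b_cb_a = (r_bc - r_ac * r_ab) / (1 - r_ab ** 2)

# Slope of output on input, with and without the college variable controlled.
b_ca_b = (r_ac - r_ab * r_bc) / (1 - r_ab ** 2)
b_ca = r_ac  # zero-order slope; equals r_AC only under unit variances

print(partial_r, part_r, b_cb_a, b_ca_b, b_ca)
```

All three college measures vanish, and the slope of output on input is unchanged by the control for college, which is the pattern the text describes for Model II.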

Model III 

In an actual college effects study it is sometimes more reasonable to 
expect Model III (Figure 3), which is a combination of Models I and II. In
Model III student input characteristics have direct influence on both the
college environment variable and the output, and the college, in turn, has
some influence on the output.




Fig. 3 Input variable influences both college
environment and output variable; college variable 
also influences output. 

For the fictitious data in Figure 3, the partial regression coefficient
of output on college with input controlled is:

$$b_{CB.A} = \frac{r_{BC} - r_{AC} r_{AB}}{1 - r_{AB}^2} \cdot \frac{\sigma_C}{\sigma_B} = \frac{.75 - .75(.50)}{1 - (.50)^2} = .50$$



The use of part correlation, however, would overestimate the college effect
(.50) in this example. With student input characteristics removed from the
output, the obtained part correlation is .57:

$$r_{B(C.A)} = \frac{r_{BC} - r_{AC} r_{AB}}{\sqrt{1 - r_{AC}^2}} = \frac{.75 - .75(.50)}{\sqrt{1 - (.75)^2}} = .567$$






The partial regression coefficient of output on input with the college
variable controlled is again easily interpreted:

$$b_{CA.B} = \frac{r_{AC} - r_{AB} r_{BC}}{1 - r_{AB}^2} \cdot \frac{\sigma_C}{\sigma_A} = \frac{.75 - .50(.75)}{1 - (.50)^2} = .50$$
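
The three Model III quantities can be reproduced together. This is a minimal sketch that assumes the example correlations used in the text (.50 between input and college, .75 for each correlation with output) and unit variances; the variable names are illustrative.

```python
import math

r_ab, r_bc, r_ac = 0.50, 0.75, 0.75  # Model III example correlations

# Partial regression coefficients (unit variances, so no sd ratio is needed).
b_cb_a = (r_bc - r_ac * r_ab) / (1 - r_ab ** 2)  # output on college
b_ca_b = (r_ac - r_ab * r_bc) / (1 - r_ab ** 2)  # output on input

# Part correlation of college with output, input removed from output only.
part_r = (r_bc - r_ac * r_ab) / math.sqrt(1 - r_ac ** 2)

print(f"b_CB.A = {b_cb_a:.2f}, b_CA.B = {b_ca_b:.2f}, part r = {part_r:.3f}")
```

Here the part correlation (about .567) exceeds the .50 college effect, whereas in Model I it fell short of it, so the direction of its bias depends on the causal structure.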



Further light can be shed on Model III by interpreting the correlations
in terms of path coefficients (Duncan, 1966). In path analysis the zero
order correlation of the college variable with the output ($r_{BC} = .75$)
in Model III consists of two parts: the association due to the direct
influence of the college on the output, and some spurious association due to
the common antecedent factor, student input characteristics. The correlation
of input with output also consists of two parts: the association due to the
direct influence of input on output, and the association due to the indirect
influence of input on output mediated through the college variable. The
spurious component in $r_{BC}$ is equal to ($r_{BC} - b^*_{CB.A}$), where
$b^*_{CB.A}$ is the standardized partial regression coefficient. The
component of $r_{BC}$ ascribed to the direct influence of the college on the
output is $b^*_{CB.A}$ (numerically equal to $b_{CB.A}$ only because unit
variances were assumed). That part of $r_{AC}$ ascribed to the direct
influence of input on output is equal to $b^*_{CA.B}$, whereas the part due
to the indirect influence of input on output mediated through the college
variable is equal to ($r_{AC} - b^*_{CA.B}$). The equations with
standardized partial regression coefficients are the "normal" equations of
variance analysis:



$$r_{BC} = b^*_{CB.A} + b^*_{CA.B} r_{AB} = .50 + .50(.50) = .75$$

$$r_{AC} = b^*_{CB.A} r_{AB} + b^*_{CA.B} = .50(.50) + .50 = .75$$

The calculations shown are those used to compute $r_{BC}$ and $r_{AC}$ for
the Model III example, which is a combination of the examples used in
Figures 1 and 2.
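
The path decomposition for Model III can be sketched as follows. The code assumes the example correlations from the text; `standardized_slopes` and the names of the components are hypothetical labels chosen for this illustration.

```python
def standardized_slopes(r_ab, r_bc, r_ac):
    """Standardized partial slopes of C on B (A controlled) and C on A (B controlled)."""
    denom = 1.0 - r_ab ** 2
    beta_cb_a = (r_bc - r_ac * r_ab) / denom
    beta_ca_b = (r_ac - r_ab * r_bc) / denom
    return beta_cb_a, beta_ca_b


r_ab, r_bc, r_ac = 0.50, 0.75, 0.75  # Model III example correlations
beta_cb_a, beta_ca_b = standardized_slopes(r_ab, r_bc, r_ac)

# The normal equations rebuild each zero-order correlation from the slopes.
r_bc_rebuilt = beta_cb_a + beta_ca_b * r_ab
r_ac_rebuilt = beta_cb_a * r_ab + beta_ca_b

# Components as interpreted under Model III.
spurious_part = r_bc - beta_cb_a  # college-output association due to input
indirect_part = r_ac - beta_ca_b  # input-output association via college

print(r_bc_rebuilt, r_ac_rebuilt, spurious_part, indirect_part)
```

Each correlation of .75 thus splits into a direct component of .50 and a remainder of .25, which Model III reads as spurious in one case and as indirect influence in the other.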

Model IV 

Often the investigator may not be justified in ascribing the input-college
correlation solely to the influence of input on the college variable, as
was assumed in Models I, II, and III. When this assumption is not warranted,
Model IV (Figure 4) results; the double-headed arrow in Figure 4 indicates
that the college and input variables are correlated for unknown reasons. For
the fictitious data in Figure 4, the same partial regression coefficients
are found as were previously calculated in Model III: $b_{CB.A} = .50$ and
$b_{CA.B} = .50$. When input and college variables are correlated because
they essentially measure the same underlying factor, any interpretation of
$b_{CB.A}$ and $b_{CA.B}$ is unwarranted without further assumptions.




Fig. 4 Both the college and the input variable 
influence output; college and input variables corre- 
lated for unknown reasons (indicated by curved arrow). 

Models III and IV can be distinguished by examining the normal regression
equations for $r_{BC}$ and $r_{AC}$. In Model III the difference
($r_{BC} - b^*_{CB.A}$) is interpreted as spuriousness because input is
antecedent to both the college and the output variables, whereas in Model IV
this difference is uninterpretable because the causal relationship of input
to college is unknown. In Model III the difference ($r_{AC} - b^*_{CA.B}$)
is evidence of the indirect effect A → B → C, whereas in Model IV this
difference cannot be meaningfully interpreted. The point is this: in Model
IV only the independent influence of college and input (as measured by the
regression coefficient) is interpretable; the joint influence of A and B on
C cannot be interpreted in causal terms. In Model III, however, the joint AB
influence is ascribed to input.

A generalized version of Model IV, in which standardized regression
coefficients are used to compute the various components of the predictable
variance, was provided by Werts (1968).

Overview 

Although part correlation is commonly used to study college effects,
it may not be the most effective statistical procedure. For the four
hypothetical models discussed, part correlation (i.e., the college
environment variable with the output when the influence of input is removed
from the output) correctly estimated the size of the college effect only in
the trivial case of a zero college effect. On the other hand, partial
regression coefficients appeared to be a generally more satisfactory measure
of college effects.

Typically, college effects studies have not attempted to determine the
causal relationships among variables. When causal relationships are not
considered, however, the investigator usually lacks the framework he needs
to interpret his results correctly. For example, it is common practice to
partial all student input variables out of the output before correlating the
available college environment measures with the residual output; any of the
obtained part correlations that reach statistical significance are
interpreted as evidence of college effects. If the "true" situation is like
Model II or III, a zero part correlation means there is no college effect.
However, if the situation is like Model V (Figure 5), a zero part
correlation means that the influence of college on output is mediated
through the input variable. Since there are many other cases in which any
interpretation of correlation or regression coefficients is unwarranted, the
investigator must be able to show why his model is reasonable.
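
The Model V case can be illustrated numerically. Because no values are given for Figure 5, the sketch below assumes hypothetical path strengths of .50 for both links (college to input, input to output) and unit variances; these numbers are assumptions made only for illustration.

```python
import math

# Hypothetical Model V (B -> A -> C); these strengths are NOT from the paper.
r_ab = 0.50  # college with input
r_ac = 0.50  # input with output
r_bc = r_ab * r_ac  # college reaches output only through input

# Part correlation of college with output, input removed from output.
part_r = (r_bc - r_ac * r_ab) / math.sqrt(1 - r_ac ** 2)

# Total (mediated) effect of college on output under Model V, unit variances.
total_effect_b_on_c = r_bc

print(f"part r = {part_r:.2f}, total college effect = {total_effect_b_on_c:.2f}")
```

These are the same three correlations as in the Model II example, so the zero part correlation cannot by itself separate "no college effect" from "a college effect of .25 carried entirely through input"; only the assumed causal ordering does that.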




Fig. 5 College variable influences input
variable; input influences output.



It would not seem wise, therefore, to adopt what might be termed a 
"shotgun" correlational approach to the study of college effects. The 
phenomenon is too complicated for reliance on such a blind procedure; and 
there is too much risk that incorrect interpretations will be made of the 
data. 

A major reason that regression analysis appears more suited than
correlation to the study of college effects is that regression coefficients
are potentially more stable. Tukey observed that: "We are very sure that the
correlation cannot remain the same over a wide range of situations, but it
is possible that the regression coefficient might" (1954, p. 41). For
example, Blalock (1961) pointed out that as one shifts units of measurement,
e.g., from individual to class to school, the regression coefficient remains
relatively stable, whereas the correlation coefficient usually increases
markedly in a way that makes it hazardous to draw conclusions about
individuals from correlation on grouped data (Robinson, 1950). Thus the
stability of the regression coefficient makes it more appropriate for
college effects research because, although often dealing with grouped data,
such research frequently hopes to draw inferences about effects on
individuals.
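
The contrast between the two coefficients under grouping can be sketched with simulated data. The example assumes a hypothetical setting in which schools differ in their average level of the independent variable and the same individual-level slope of .50 holds in every school; the number of schools, the school size, and the seed are arbitrary.

```python
import math
import random


def slope_and_r(xs, ys):
    """Zero-order regression slope of y on x and their correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)


rng = random.Random(7)
n_schools, n_students = 50, 100
student_x, student_y, school_x, school_y = [], [], [], []
for _ in range(n_schools):
    school_level = rng.gauss(0.0, 1.0)  # schools differ in average x
    xs = [school_level + rng.gauss(0.0, 1.0) for _ in range(n_students)]
    ys = [0.50 * x + rng.gauss(0.0, 1.0) for x in xs]  # same slope everywhere
    student_x.extend(xs)
    student_y.extend(ys)
    school_x.append(sum(xs) / n_students)  # grouped data: school means
    school_y.append(sum(ys) / n_students)

slope_ind, r_ind = slope_and_r(student_x, student_y)
slope_grp, r_grp = slope_and_r(school_x, school_y)
print(f"students: slope = {slope_ind:.2f}, r = {r_ind:.2f}")
print(f"schools:  slope = {slope_grp:.2f}, r = {r_grp:.2f}")
```

Under these assumptions the slope should stay near .50 at both levels while the correlation rises sharply for school means, because averaging removes most of the individual-level disturbance. Other grouping schemes could behave differently.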

A question crucial to college effects studies concerns the analysis
of multiple input or college variables with or without measurement error
(Blalock, 1965). However, this problem is too complex to discuss here;
this paper is intended only as an introduction to the use of structural
equations (for more advanced treatments see Johnston, 1963; Wold and Jureen,
1953).

Consideration of the relative merits of correlation and regression
coefficients for the study of college effects should not be construed as a
rejection of the college effects studies conducted so far. The use of
regression coefficients, framed within a causal model, may simply provide a
more sensitive test of that model. The really pressing need is for more
valid testable models.